Person Re-Identification (person re-id) is a crucial task because of its applications in visual surveillance and human-computer interaction. In this work, we present a novel joint Spatial and Temporal Attention Pooling Network (ASTPN) for video-based person re-identification, which makes the feature extractor aware of the current input video sequences, so that interdependency between the matching items can directly influence the computation of each other's representation. Specifically, the spatial pooling layer selects regions from each frame, while the attention temporal pooling selects informative frames over the sequence, with both pooling operations guided by information from distance matching. Experiments are conducted on the iLIDS-VID, PRID-2011 and MARS datasets, and the results demonstrate that this approach outperforms existing state-of-the-art methods. We also analyze how joint pooling in both dimensions boosts person re-id performance more effectively than using either of them separately.
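To illustrate how temporal pooling over one sequence can depend on the sequence it is matched against, the following is a minimal NumPy sketch of one possible attentive temporal pooling step. It is an illustration under stated assumptions, not the network described above: the bilinear score matrix with a `tanh` nonlinearity, the max-then-softmax weighting, and all names (`attentive_temporal_pooling`, `P`, `G`, `U`) are hypothetical choices made here, and the spatial pooling layer and the recurrent feature extractor are omitted.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def attentive_temporal_pooling(P, G, U):
    """Pool two sequences of frame features into one vector each.

    P: (T_p, d) per-frame features of the probe sequence (assumed given).
    G: (T_g, d) per-frame features of the gallery sequence (assumed given).
    U: (d, d) interaction matrix; learned in a real model, random here.
    """
    # Pairwise frame-to-frame scores between the two sequences.
    A = np.tanh(P @ U @ G.T)              # shape (T_p, T_g)
    # A frame's importance is its best score against the other sequence.
    a_p = softmax(A.max(axis=1))          # weights over probe frames
    a_g = softmax(A.max(axis=0))          # weights over gallery frames
    # Attention-weighted averages give the sequence-level representations.
    v_p = a_p @ P                         # shape (d,)
    v_g = a_g @ G                         # shape (d,)
    return v_p, v_g, a_p, a_g


rng = np.random.default_rng(0)
P = rng.standard_normal((5, 8))           # 5 probe frames, 8-dim features
G = rng.standard_normal((7, 8))           # 7 gallery frames, 8-dim features
U = 0.1 * rng.standard_normal((8, 8))     # stand-in for a learned matrix

v_p, v_g, a_p, a_g = attentive_temporal_pooling(P, G, U)
distance = float(np.linalg.norm(v_p - v_g))  # Euclidean matching distance
```

Because the weights `a_p` are computed from scores against `G`, the pooled probe vector `v_p` changes when the same probe is compared with a different gallery sequence, which is the mutual dependence the abstract refers to.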